SUMMARY


This paper outlines some of the research activities underway as part of the Air Force's Learning
Abilities Measurement Program (LAMP). The major goal of the project is to devise new models of
the nature and organisation of human abilities with the long-term goal of applying those models to 
improve current personnel selection and classification systems. As an approach to this ambitious 
undertaking, we have divided the activities of the project into two categories. The first category is
concerned with identifying fundamental learning abilities by determining how learners differ in their 
abilities to think, remember, solve problems, and acquire knowledge and skills. From research already 
completed, we have established a four-source framework that assumes that observed learner 
differences are due to differences in processing speed; processing capacity; and the breadth, extent, and
accessibility of conceptual knowledge and procedural and strategic skills. The second category of
research activities is concerned with validating new models of learning abilities. To do this, we are 
building a number of computerized intelligent tutoring systems that serve as mini-courses in technical 
areas such as computer programming and electronics troubleshooting. A major objective of this part 
of the program is to develop principles for producing indicators of student learning progress and 
achievement. These indicators will serve as the learning outcome measures against which newly 
developed learning abilities tests will be evaluated in future validation studies.
I. INTRODUCTION


Considerable headway has been made during the last decade in our understanding of human 
cognition. This has led to speculation that it is only a matter of time before an improved technology for 
gauging individuals' intellectual proficiencies will be developed. The stakes are high: Psychological 
testing of cognitive proficiency is presently widespread in industry, the schools, and the military. 
Improved tests would have a profound economic impact in cutting education and training costs and 
enabling a more efficient and fair system of personnel utilization. Although the concept of 
psychological testing must certainly be considered one of psychology's true success stories, it is also 
primarily a past accomplishment. Systematic studies of predictive validity have shown that today’s 
aptitude tests are no better than those available shortly after World War II (Christal, 1981; Kyllonen,

But even if it is agreed that forces are conspiring to usher in a new era of cognitive testing, there 
still is considerable debate on exactly what form these new cognitive tests will take. On one side of the 
debate, some argue that what cognitive psychology has to offer is a rationale and a methodology for
measuring basic information processing components (Detterman, 1986; Jensen, 1982; Posner &
McLeod, 1982). According to this view, the cognitive test battery of the future would consist of
measures of speed of retrieval from long-term memory, short-term memory scanning rate, probability 
of transfer from short- to long-term storage, and the like. On the opposite end of the debate are those 
who suggest that the fundamental insight of cognitive science is that cognitive skill reflects primarily 
knowledge rather than general processing capabilities. This perspective has led to calls for testing 
intermingled with instruction, testing aimed at measuring what students know and what they have 
learned in the context of their current instructional experience (Embretson, in press; Glaser, 1985).
This has been called steering testing (Lesgold, Bonar, & Ivill, 1987) or apprenticeship testing (Collins,
1986). Between these positions are those who propose new kinds of cognitive tests that are not
radically different from existing ones, but perhaps richer and more diverse in what they measure (Hunt,
1982; Hunt & Pellegrino, 1984; Sternberg, 1981b). 


In this paper, we provide a status report of one ongoing program of research, the Learning Abilities 
Measurement Program (LAMP), that has been concerned with developing new methods for measuring 
cognitive abilities. We discuss some of our early thinking on the implications of cognitive psychology
for testing, and how we have adjusted our ideas in light of data collected in our cognitive abilities
measurement (CAM) laboratory. We conclude with a brief discussion of CLASS, the Complex
Learning Assessment Laboratory, the setting in which we intend to validate the new tests. 1

II. COGNITIVE THEORY AND APTITUDE TESTING 

The idea of grounding psychological testing in cognitive theory is not entirely novel. During the 
1970s and 1980s, the Air Force Office of Scientific Research (AFOSR) and especially, the Office of 
Naval Research (ONR) supported a number of basic research projects which had the explanation of 
individual differences in learning and cognition as a central goal. This research largely concentrated on 
the analysis of conventional aptitude tests, probably for two reasons. First, analysis of aptitude tests is 
important in its own right, as an attempt to determine what it is that such tests measure. But, second, 
and perhaps more importantly, aptitude tests can be viewed as generic surrogates for tasks tapping 
more complex, slowly developing learning skills. It is difficult and extremely expensive to identify and 
analyze the information processing components associated with the acquisition of computer 
programming skill; so goes the argument: It is far cheaper and more efficient to analyze the seemingly 
more tractable components of some aptitude test, such as an analogies test, that predicts success in
computer programming. And the fact that tests do such a good job in predicting training outcomes can 
be taken as evidence that pretty much the same cognitive components are involved in both test-taking 
and learning. 


This paper does not review the research accomplished by William Tirre and Linda Elliott
concerning individual differences in text comprehension. Readers interested in this area are referred
to Tirre and Elliott (1987).








The wave of aptitude research that was motivated by these considerations did not lead directly to 
improvements in existing aptitude testing systems, however. A number of new methods and 
techniques, such as cognitive correlates analysis (Hunt, Frost, & Lunneborg, 1973) and componential
analysis (Sternberg, 1977), were developed for analyzing aptitude tests, but the application of these
methods did not suggest how the tests themselves might be improved. There have been suggestions
that cognitive tasks exported from the experimental psychologist's laboratory might somehow be used
to supplement or even replace existing aptitude tests (Carroll, 1981; Hunt, 1982; Hunt & Pellegrino,
1984; Pellegrino & Glaser, 1979; Rose & Fernandez, 1977; Snow, 1979; Sternberg, 1981b), but after
almost 10 years, the research still has not been carried out to an extent sufficient for determining 
whether this is really feasible. 

Probably the reason cognitive-based aptitude research has not translated already into better tests is 
that this has not been a primary goal of the research. Indeed, if the creation of better tests had been 
the primary goal, the approach of analyzing and decomposing existing tests does not seem very 
promising. If such research efforts were completely successful (if the research turned out better than
anyone's wildest expectations), at best, new tests would simply duplicate the validity of existing tests.

III. LEARNING ABILITIES MEASUREMENT PROGRAM (LAMP) 

In contrast to some of the aptitude research projects previously discussed, our own work in 
connection with Project LAMP has from its inception been focused on the goal of developing an 
improved selection and classification system. Our current efforts fall into two categories. First, we are 
continuing to model basic cognitive learning skills and their interrelationships, and to explore different 
methods for measuring these skills. Second, we have more recently begun thinking seriously about a 
system for validating the new cognitive measures. The system involves the extraction of learning
indices, both on short-term (1 hour) and long-term (1 week) learning tasks, that will serve as criteria 
against which the new cognitive measures will be validated. Although we have not yet collected data on 
the long-term learning tasks, we have set up the laboratory, which consists of 30 computerized tutoring
stations. In the remainder of this paper, we discuss these two categories of ongoing LAMP research.
We begin with a discussion of studies that have attempted to measure cognitive skills. 

Modeling Cognitive Skills: The Four-Source Framework 

Much of our work on identifying basic learning skills has centered around what we have called the
four-source framework (Kyllonen, 1986). This is the idea that individual differences in a wide variety of
learning and performance tasks are due to differences in four underlying sources: (a) effective cognitive
processing speed; (b) effective processing capacity; and the general breadth, accessibility, and pattern of
one's (c) conceptual knowledge and (d) procedural and strategic skills. Figure 1 illustrates these
relationships. 

We refer to the knowledge and skill components of this model (components [c] and [d]) as enablers,
in the sense that any learning or performance task can be characterized as consisting of a necessary set
of knowledge and skill prerequisites. We refer to the processing speed and working memory
components of the model ([a] and [b]) as mediators, in the sense that these components mediate the
degree to which the learner or problem-solver is able to use his or her knowledge and skills effectively.
We have found the four-source framework to be useful in organizing our own as well as others' 
research and in monitoring our research progress. Further, although we have not yet applied it widely 
in this fashion, we expect that the system will be useful for task analysis purposes. 

Thus far, most of the research we have accomplished in connection with the four-source proposal 
has been concerned with (a) improving the way in which we measure cognitive skills and (b) 
determining the dimensionality of the skills and subskills embedded within the four-source model. We 
now turn to a discussion of the four components, in turn. 

Processing Speed 

Considerable research on individual differences in cognition over the past 10 years has been 
concerned with determining the relationship between processing speed and performance on complex 
did just that and found evidence for both separate reasoning, quantitative, and verbal processing
factors, and a higher-order general processing speed factor. Interestingly, we found that although
processing speed scores were quite reliable, at least within session, they were not related to accuracy
scores on the same tests. Timed versions of the tests thus mix these two separable components of
performance in yielding only a single score. There are problems with this approach to testing the
dimensionality question, such as how to allow for speed-accuracy trade-off, what to do with response
times when the person guessed incorrectly, and so forth. But a more substantive problem is that
although the findings are suggestive, they fall considerably short of revealing much about the processes
that produced them.

Thus, in subsequent work we have restricted our focus (and employed a narrower range of tasks) in
the hope of achieving a better process-oriented understanding of the generality question. In these
studies, we attempted to identify processing stages, then measure the duration of those stages for
individual subjects, then compute the stage intercorrelations. The procedure is best illustrated by
example. In the first study (Kyllonen, 1987), we administered a series of tasks that required subjects
simply to determine whether two words presented (e.g., happy-lose) were similar or dissimilar with
respect to valence. Happy would be considered a positive-valence word; lose would be considered a
negative-valence word. We presumed that a decision on this task was executed after a series of
processing stages. The subject begins by encoding one of the words, then encoding the second word.
The result of the encoding process is that a symbol representing valence is deposited in working
memory for each word. The subject then compares those symbols. The result of the comparison
process is an implicit assertion that the symbols are either the same or different. A decision process
then takes the comparison result and translates it into a plan for the execution of the motor response.
A response process then executes the motor response. Through the method of pre-cueing, which has
been used with some success in separating process components on other reaction time tasks (e.g.,
Sternberg, 1969), we were able to independently estimate the duration of each of these processing
stages.





















We also administered two other versions of the task in which the only difference was that subjects
were required to decide whether (a) two digits were the same with respect to oddness or evenness, or
(b) two letters were the same with respect to vowelness or consonantness. The data analysis addressed
two questions regarding generality. First, were parallel measures of stage duration (estimates derived 
from separate blocks of items) more highly inter-correlated than correlated with other stage durations? 
This is a direct test of stage independence. Second, were stage durations estimated from tasks with 
different content (words, digits, or letters) more highly inter-correlated or were alternative stages taken 
from same-content tasks more highly inter-correlated? This is a direct test of the relative importance 
of content and process. Although the analyses were rather complex, the general finding was that 
processes were somewhat independent, and also general across contents. That is, fast encoders were 
not necessarily fast comparers, but fast encoders on the word task were also fast encoders on the digit
task.

One of the problems with this approach to studying dimensionality is that it relies on a model of 
performance that assumes serial execution of processing stages. In our more recent work (Kyllonen, 
Tirre, & Christal, 1988), we have relaxed this assumption by applying both those models that assume
serial execution and those that do not in estimating stage durations. (We also have abandoned the
pre-cueing technique because its validity depends on the serial execution assumption.) Following
Donaldson's (1983) analysis, stage durations can be estimated in two ways. Assume an ordered set of 
tasks, each of which can be characterized as requiring a proper superset of the processes of its 
predecessor. For example, the following set of tasks, each of which requires processing a pair of words, 
might be characterized this way: reaction time, choice reaction time, physical matching, name 
matching, semantic (meaning) matching. That is, reaction time consists only of a reaction component; 
the choice task adds a decision component, the physical matching task adds comparison, name 
matching adds retrieval from long-term memory, and semantic matching adds search through long-term
memory.
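The ordering of tasks described above can be written down as a small data structure. The following is only an illustrative sketch: the component labels are shorthand chosen here, not the authors' notation, and the check simply confirms that each task's component set is a proper superset of its predecessor's.

```python
# Hypothetical encoding of the ordered task set; labels are shorthand for this sketch.
TASKS = [
    ("reaction time", {"reaction"}),
    ("choice reaction time", {"reaction", "decision"}),
    ("physical matching", {"reaction", "decision", "comparison"}),
    ("name matching", {"reaction", "decision", "comparison", "retrieval"}),
    ("semantic matching",
     {"reaction", "decision", "comparison", "retrieval", "search"}),
]


def added_components(tasks):
    """Return (task name, newly added components) for each task in order.

    Raises ValueError if a task is not a proper superset of its predecessor.
    """
    result = []
    previous = set()
    for name, components in tasks:
        if not components > previous:  # '>' on sets tests for a proper superset
            raise ValueError(f"{name} does not extend its predecessor")
        result.append((name, components - previous))
        previous = components
    return result


for name, new in added_components(TASKS):
    print(f"{name}: adds {sorted(new)}")
```

Each task contributes exactly one new component here, which is what allows a stage duration to be attributed to the contrast between a task and its predecessor.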
One can estimate each of these stage durations either by subtracting latency on the predecessor
task from latency on the target task (the difference score model), or by statistically holding constant the
duration of all predecessor tasks (the part correlation model). The two models employ differing
assumptions about the relationships among task components. The difference score model assumes
nothing about the relationship between the duration of the target component (e.g., comparison) and
the duration of the predecessor task (e.g., choice reaction time). Thus, this correlation is a parameter
to be estimated. But the cost of this flexibility is the assumption that the duration of the target
component (e.g., comparison) remains constant, regardless of whether the component is embedded in
the physical matching task, the name matching task, or whatever. Conceptually there are two problems
with this assumption. Consider the reaction component. It may be that reaction is rapid when nothing
else is going on, as on the simple reaction time task, but slow when it follows complex processing, as on
the semantic matching task. Or it could be the opposite, due to parallel processing: Reaction appears
slow on the simple reaction time task because it is the only process executing; but on the meaning
identity task, the reaction begins before decision ends, and thus appears fast (as is specified in process
cascading models, McClelland, 1979).


The part correlation model avoids this assumption and allows for variability in stage durations over
different tasks. This is represented as freedom in the regression weight associated with stage duration
to differ from 1.0. But in order to achieve this flexibility, the part correlation model must compensate
with an assumption not required with the difference score model. In the part correlation model, it is
assumed that the duration of the target stage is uncorrelated with the duration of the predecessor task.
For example, the duration of the comparison component in the context of the physical matching task
would be assumed to be uncorrelated with response time on the choice reaction time task.


Which of these sets of assumptions is correct, those associated with the part correlation model or
those associated with the difference score model? It is not possible to tell, but it is possible to employ
both models and then to be confident of relationships only when the models agree.
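The contrast between the two estimation models can be made concrete with a short sketch. The latencies and vocabulary scores below are made-up illustrative values, not data from the studies reported here, and the agreement rule (same sign, within a tolerance) is an assumption introduced only for illustration. The difference score fixes the weight on the predecessor latency at 1.0; the part correlation estimate uses the residual from a least-squares regression, which leaves the weight free but forces the estimate to be uncorrelated with the predecessor latency.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference_scores(target, predecessor):
    """Difference score model: target latency minus predecessor latency."""
    return [t - p for t, p in zip(target, predecessor)]


def residual_scores(target, predecessor):
    """Part correlation model: residual of target regressed on predecessor."""
    n = len(target)
    mt, mp = sum(target) / n, sum(predecessor) / n
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predecessor, target))
             / sum((p - mp) ** 2 for p in predecessor))
    intercept = mt - slope * mp
    return [t - (intercept + slope * p) for t, p in zip(target, predecessor)]


def models_agree(r_a, r_b, tolerance=0.15):
    """Hypothetical agreement rule: same sign and within `tolerance`."""
    return (r_a * r_b > 0) and abs(r_a - r_b) <= tolerance


# Made-up latencies (ms) and vocabulary scores for six subjects.
choice_rt = [400, 420, 450, 480, 500, 530]
physical_rt = [520, 560, 570, 640, 610, 700]
vocabulary = [32, 30, 31, 24, 29, 22]

r_diff = pearson(difference_scores(physical_rt, choice_rt), vocabulary)
r_part = pearson(residual_scores(physical_rt, choice_rt), vocabulary)
print(f"difference score r = {r_diff:.2f}, part correlation r = {r_part:.2f}")
print("models agree:", models_agree(r_diff, r_part))
```

Under this sketch, a relationship with the criterion would be reported only when `models_agree` returns true for the two estimates.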








We took this approach in attempting to estimate the relationship between processing stage 
durations and performance on a vocabulary test, and also on a paired-associates learning task. 
Vocabulary is an interesting test case because it is a good measure of general intelligence. The current 
view is that breadth of word knowledge reflects efficient learning processes in inferring word meanings 
in context (Marshalek, 1981; Sternberg & Powell, 1983). An additional motivation for looking at
vocabulary as a criterion was that a considerable literature has evolved from Hunt and colleagues'
(Hunt et al., 1973) early finding of a relationship between the duration of the retrieval stage (as
estimated by the difference between response time on the name and physical matching tasks) and
verbal ability.

Contrary to Hunt et al. and other previous work, however, we did not find much of a relationship
between retrieval speed and vocabulary (r - P. V - 710), but we did find a strong relationship
between search speed and vocabulary (r -- 4 ‘t). Subjects capable of quickly accessing semantic
attributes of words, controlling for how quickly they did other kinds of information processing, had
larger vocabularies than did other subjects.

We found a similar relationship between processing speed and learning, but only in particular
circumstances, namely, when study time on the learning task was extremely short (.5 to 2 seconds per
pair). The component analysis again made it possible to isolate the semantic search component, as
opposed to other processing speed components, as the one consistently most critical in determining
learning success. Over a number of studies (which varied on block size, recognition vs. recall
responses, etc.), the correlation between learning success and response time on the meaning identity
test, controlling for (or eliminating by subtraction) response time on other information processing tests,
ranged from r = .30 to r = .50. In some studies, other information processing speed components
predicted learning outcomes, but only inconsistently.

We currently are engaged in two lines of extension to the processing speed work. One is motivated
by the idea that information processing speed may be closely tied to working memory capacity insofar
as both measures reflect the dynamic activation level of a memory trace (Woltz, 1987). An intriguing
defined by the Numerical Operations subtest (r = .75), but it also was significantly loaded by latencies
from the Mental Arithmetic Test and the Sunday Tuesday Test (r > .50). The basic pattern of results 
found here has been corroborated in a recently completed follow-up study. 

Taken together, the results suggest the involvement of both domain knowledge (quantitative and
verbal) and a domain-independent working memory in memory test performance. In addition, it
appears from the data over the two studies that the Working Memory factor subsumes the Reasoning
factor. That is, individual differences in reasoning proficiency may be due entirely to differences in
working memory capacity. It is crucial to note that the factor on which all the reasoning tests in the battery
loaded highest is a Working Memory factor, in that the test that defined it, Alpha Recoding (r = .68, in
the follow-up study), does not appear to involve reasoning per se but clearly depends on working
memory capacity.

Recently, we have begun investigating an alternative to the processing workspace model which is
based on a different conceptualization of working memory. The activation capacity model, based
primarily on Anderson's (Vk\, \< "1 •) theory, views working memory, not as a separate short-term
store, but rather, as a state of fluctuating activation patterns characterizing traces in long-term memory.
According to this theory, long-term memory is a network of traces, each characterized by a resting
activation level. Traces can become activated when they become the focus of attention, or are linked to
the focus of attention, then fall back into a state of deactivation as other traces move to the center of focus.
Working memory is said to be a "matter of degree" rather than an all-or-none state, in that at any given
moment, a trace might be the focus of attention (and thereby be at a peak activation level) or it might
be continuously fading from attention if, for example, it was the focus a few seconds earlier.

The application of this model has resulted in tests of working memory capacity that look quite
distinct from those based on the processing workspace model. Figure 5 illustrates a test developed by
Woltz (1987) to reflect individual differences in activation capacity. In this test, subjects are presented
a series of word pairs and are required to determine whether or not the words are synonyms.
Occasionally, words are repeated at lags ranging from one to eight items. As Figure 5 shows, mean


























response time is 1265 ms if neither of the words was shown before, but that time is reduced by 191 ms
if one of the words was encountered on the previous item, and by 107 ms if one of the words was
encountered eight items ago. The interpretation is that the word encountered even eight items ago is
still more highly active than it would be at its true resting state, and therefore is processed faster.
Woltz argues that individual differences in the response time facilitation effect reflect differences in
activation capacity.
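The arithmetic behind the facilitation effect can be sketched as follows. The baseline and the two reductions are the values reported above; the per-subject scoring function is a hypothetical illustration of how a facilitation score might be computed, not the scoring procedure actually used.

```python
BASELINE_MS = 1265                   # mean RT when neither word was seen before
FACILITATION_MS = {1: 191, 8: 107}   # reported RT reduction (ms) by repetition lag


def primed_rt(lag):
    """Mean response time implied for a word repeated `lag` items later."""
    return BASELINE_MS - FACILITATION_MS[lag]


def facilitation_score(unprimed_rts, primed_rts):
    """Hypothetical per-subject score: mean unprimed RT minus mean primed RT."""
    mean_unprimed = sum(unprimed_rts) / len(unprimed_rts)
    mean_primed = sum(primed_rts) / len(primed_rts)
    return mean_unprimed - mean_primed


for lag in sorted(FACILITATION_MS):
    print(f"lag {lag}: {primed_rt(lag)} ms")
```

On this reading, a subject whose facilitation score remains large at long lags would be described as having greater activation capacity.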


Given that we can define working memory capacity in two distinct ways, an important next question
is: What is the empirical relationship between the two kinds of measures, and even more importantly,
what is their relationship to learning? Cognitive analyses of learning tasks (Anderson, 1987; Anderson
& Jeffries, 1985), such as mathematics learning or learning a computer programming language, suggest
that the limiting factor in learning is the working memory bottleneck. But the proof of this assertion is
often rather theoretical, based on a rational analysis of learning task requirements, supplemented by a
formal computer simulation of learning processes. An individual differences analysis of the role of
working memory in learning can be a useful supplement to this kind of formal analysis, and is a fair test
of the theoretical claim (Underwood, 1975). Thus we have recently begun investigating the
relationship between working memory capacity (as measured by tests such as those displayed in
Figures 7 and 8) and performance in realistic learning contexts. We currently are investigating the
acquisition of electronics troubleshooting (Kyllonen, Stephens, & Woltz, 1988) and computer
programming skills (Kyllonen, Soule, & Stephens, 1988), and other procedural learning tasks (Woltz,
1987). In all cases, we find that working memory, as indicated by both the processing workspace and
activation capacity measures, is a strong predictor of learning outcomes. These analyses are beginning
to clarify our understanding of working memory. The results also suggest that the particular tests of
working memory capacity that we thus far have developed (Figures 7 and 8) are solid candidates for
inclusion in future testing batteries.



















































been used for a similar purpose (I. mdnncr. v.'&i). We are currently using the sentence verification
technique for tracking the accumulation of declarative knowledge during the course of short (45-minute)
episodes in computer programming (Kyllonen, Soule, & Stephens, 1988) and
electronics troubleshooting (Kyllonen, Stephens, & Woltz, 1988).

Even the measurement of the depth and breadth dimensions of knowledge may benefit from recent
work in cognitive science. The most innovative recent developments in probing declarative knowledge
have been stimulated by researchers concerned with achievement testing (Frederiksen, Lesgold, Glaser,
& Shafto, in press; Glaser, Lesgold, & Lajoie, in press; Glaser, 1985; Lesgold et al., 1987). Glaser et
al. point out that current methods, typically multiple-choice tests, suffer two key
drawbacks. First, the alternatives cannot possibly anticipate all the possible misconceptions a
student could have, and thus are of limited diagnostic utility. Second, the alternatives may give away
the answer, as has been shown ...

Glaser et al. discuss ... approaches to knowledge assessment, which in
... protocols extracted from students struggling
with new material or ... Analysis of these kinds of protocols has
... (Ericsson & Simon, 1984) and serves as
... cognitive task analysis.

The ... is expense: Protocol analyses are
... on both subject and ... appropriate for inclusion in a test
strategy. Second, the hierarchical arrangement can closely mirror the way in which a student is
thinking about a problem, in a kind of top-down fashion. 

Thus far, this approach to probing an individual's knowledge has been employed in one of the 
CLASS tutoring systems. Bridge (Bonar & Cunningham, 1986), which teaches learners how to 
program in Pascal, presents general programming problems to be solved. At the top level (the first set 
of questions), the alternatives are general categories or general approaches to the problem (e.g., "add
something together" or "keep doing something"). Once the student selects a category, he or she is 
presented a list of alternatives that refine the category selection, and so on, until a fully specified 
answer is selected. From pilot testing using Air Force subjects, the method has proved general enough 
to accommodate the vast majority of potential responses to particular programming problems; 
therefore, the approach seems highly promising as a way of assessing knowledge status in the student. 
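The hierarchical selection procedure described above can be sketched as a walk through a nested menu. This is a minimal illustration and not the Bridge implementation: the two top-level categories are taken from the text, while the refinements beneath them and the function name are hypothetical.

```python
# Nested menu: a dict maps each alternative to its refinements;
# None marks a fully specified answer. Refinement labels are hypothetical.
MENU = {
    "add something together": {
        "add two values once": None,
        "keep a running total": None,
    },
    "keep doing something": {
        "repeat a fixed number of times": None,
        "repeat until a condition holds": None,
    },
}


def select(menu, choices):
    """Follow a student's successive choices down the menu.

    Returns (path taken, whether a fully specified answer was reached).
    """
    node = menu
    path = []
    for choice in choices:
        if node is None:
            raise ValueError("answer is already fully specified")
        if choice not in node:
            raise KeyError(choice)
        path.append(choice)
        node = node[choice]
    return path, node is None


path, complete = select(MENU, ["keep doing something",
                               "repeat until a condition holds"])
print(path, complete)
```

Because the path records every intermediate category, a partially correct selection (right category, wrong refinement) remains visible for diagnosis.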

To summarize, although we have not yet fully explored the domain of how to probe a learner's
declarative knowledge base, we have made some important initial steps. It is likely that as we begin 
further testing in the more complex tutoring systems environments, the methods described in this 
section will be refined further. 

Skills 

We define skills, or procedural knowledge as it is referred to in the cognitive science literature, fairly
informally, as any unit of knowledge that is typically or would likely be represented in production
system simulations in the form of an if-then rule or series of if-then rules. This is any knowledge or
skill the student has that might bear directly on problem solving ("how-to knowledge"). Procedural skill
varies widely along the generality dimension; at the most general level are problem solving heuristics or
approaches, such as working backward, means-ends analysis, or persisting in the face of uncertainty.

At the opposite end of the continuum are very specific procedures, such as moving the cursor to
position 14, 45 when required to delete a character at position 14, 45.
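The if-then representation of a specific procedure can be illustrated with a toy production system. This is a sketch under stated assumptions, not a model from the literature cited here: the rule names, the state layout, and the cursor position used are all hypothetical.

```python
def needs_move(state):
    """IF the goal is to delete a character and the cursor is elsewhere."""
    return state["goal"] == "delete" and state["cursor"] != state["target"]


def move_cursor(state):
    """THEN move the cursor to the target position."""
    state["cursor"] = state["target"]


def ready_to_delete(state):
    """IF the goal is to delete a character and the cursor is on it."""
    return state["goal"] == "delete" and state["cursor"] == state["target"]


def delete_char(state):
    """THEN delete the character under the cursor and clear the goal."""
    row, col = state["target"]
    line = state["lines"][row]
    state["lines"][row] = line[:col] + line[col + 1:]
    state["goal"] = None


RULES = [
    ("move-cursor", needs_move, move_cursor),
    ("delete-char", ready_to_delete, delete_char),
]


def run(rules, state, max_cycles=10):
    """Fire the first matching rule on each cycle until none match."""
    fired = []
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(state):
                action(state)
                fired.append(name)
                break
        else:
            break  # no rule matched: halt
    return fired


state = {"goal": "delete", "cursor": (0, 0), "target": (0, 1), "lines": ["hello"]}
print(run(RULES, state), state["lines"])
```

A general heuristic such as working backward would differ mainly in that its conditions refer to goals and subgoals rather than to a particular screen position.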










One fairly consistent finding in cognitive research is that although specific procedures are trainable, 
general procedures are quite resistant to modification. This finding is certainly not due to a shortage of 
attempts to modify general skills. Kulik, Bangert-Drowns, and Kulik (1984) reviewed over 50 studies of
the effects of extensive coaching for the Scholastic Aptitude Test (SAT). They concluded that the 
effects, even for long-term training, were quite small (approximately one-sixth to one-third standard 
deviation, or 17 to 34 points). The results of Venezuela's Project Intelligence (Herrnstein, Nickerson,
de Sanchez, & Swets, 1986) may be seen similarly as somewhat disappointing. Despite an ambitious 
project in which domain-free thinking skills were taught 4 days per week, in 45-minute lessons, for an 
entire year, the actual changes experienced on standard measures of cognitive skill (intelligence tests) 
were quite minuscule (about .3 sd). These findings should not have come as any great surprise. 
Attempts to have students transfer general problem-solving approaches to superficially distinct but 
isomorphically identical problems have repeatedly failed (e.g., Brown & Campione, 1978; Simon & 
Hayes, 1976). 

On the other hand, there is good evidence for the modifiability of specific skills, especially in 
context. Schoenfeld (1979) has shown how training in mathematical heuristics (e.g., draw a diagram, 
simplify the problem, test the limiting case) can facilitate subsequent problem solving so long as the 
instruction is wedded tightly to the domain material simultaneously being taught. Recent analyses of 
transfer of training have shown that skill transfer is excellent and quite predictable when the skills 
transferred are related at some conceptual level to the new skills (Anderson, 1987; Kieras & Bovair,
1986). 

The implications of these two results for testing purposes are apparent. On the one hand, specific 
procedural knowledge is rather easily modifiable and therefore ought to perhaps be trained rather than 
tested for, at least in the personnel selection and classification context. Recent work on diagnostic 
monitoring (Frederiksen et al., in press; Lesgold et al., 1987) shows how tests can be used to tailor
instruction and are thus appropriate for this purpose. On the other hand, general procedural 
knowledge should have an important predictive relationship to learning ability, and it seems to be fairly 












immutable. General procedural knowledge, therefore, is an ideal capability to test for in entrance 
(selection and classification) testing. It is interesting that researchers from very diverse perspectives-- 
psychometric (Cattell, 1971), information processing (Sternberg, 1981a), and artificial intelligence 
(Schank, 1980)-have argued consistently for the importance of the ability to cope with novel problems 
as a key aspect of intelligence, and therefore as an ideal candidate for inclusion in aptitude test 
batteries. 

Do we now test for general procedural knowledge, or general problem-solving skills? As was the 
case with declarative knowledge, there certainly are in existence paper-and-pencil tests that would 
appear to tap very general problem-solving skill, Raven's Progressive Matrices being an excellent
example. And about 7 years ago, ETS began supplementing its existing Verbal and Quantitative
portions of the Graduate Record Examination with a new test of Analytic ability (Wilson, 1976). The
ASVAB comes close to testing general problem-solving ability with the Arithmetic Reasoning subtest.
This subtest consists of story problems such as "How many 36-passenger buses will it take to carry 144
people?" (DoD, 1984). Recall that the Arithmetic Reasoning subtest loaded highly on the Working
Memory factor in the Christal (1987) study, which suggests an intriguing research question: What is 
the relationship between working memory and procedural skill? 

We can think of working memory capacity as mediating the development and efficiency of general 
problem-solving strategies. But an alternative view of the relationship between the two constructs 
assigns the central role to working memory. Baddeley (1987) has proposed a model of working 
memory consisting of various slave storage subsystems (for storing linguistic information, spatial 
information, etc.), along with a central executive which monitors and coordinates the activities of the 
subsidiary storage systems. Executive skill, then, is skill in monitoring one's problem-solving processes, 
adapting to changing task requirements, successfully executing general problem-solving strategies, 
allocating resources where they are needed, and more generally, changing processing strategy in 
accordance with changes in processing demands.

In this way, the executive can be seen as the most important component of working memory. Yet,
though we have a reasonable understanding of how the subsidiary storage systems function, according
to Baddeley the workings of the central executive still remain largely a mystery. An important and
exciting research direction is to begin devising means for measuring executive skill and thereby begin
unraveling that mystery.


Modeling Learning Skills 


Learning Skills Taxonomy 

If we can adequately measure knowledge and the various skills associated with the four sources, an 
important next step in the research program is to demonstrate the relationship between those scores 
and scores generated from a trainee's interaction with a learning task. We believe that learning should 
be expressible in terms of (i.e., predictable from) the underlying components, but it is necessary to
prove that this is the case. 

Much of our research until fairly recently has used grossly simplified learning tasks as criterion
measures against which to validate the new cognitive abilities measures. For example, in the
Kyllonen-Tirre-Christal (1988) study, performance on various paired-associates tests was used as the
criterion; and in other studies, we have employed comparably simple, short-term learning tasks. The
logic underlying this decision is twofold. First, we are concerned with developing rigorous models of the
aptitude-learning-outcome relationship; and simple, short-term learning tasks afford more control over
the instructional environment. Second, we believe that the kind of learning involved in even these
simple tasks is at some fundamental level the same as that involved in more realistic learning situations.
Or, conversely, even apparently complex classroom learning can be analyzed and decomposed into a
series of much simpler learning acts.

If we accept the notion that even complex learning tasks can be broken down into their constituent
learning activities, then it obviously would be useful to specify the nature of those basic learning
activities. One proposal that has been useful in our work, based largely on Anderson's (1987)
three-stage model of skill acquisition, is represented on the right side of Figure 1. The idea is that
cognitive skills develop through an initial engagement of declarative learning processes ("memorizing
the steps"), followed by an engagement of proceduralization processes ("executing the steps"), then
finally refinement processes ("automatizing the steps"). As Figure 4 shows, different performance
measures will be sensitive to the course of skill development at various points along the way. When first
learning a skill, many mistakes will be made, and accuracy measures will be the most sensitive indicators
of skill development. Later, when the skill is known, few mistakes will be made, and performance time
measures will be the most sensitive indicators. Still later, performance time will approach a minimum
as the target skill becomes increasingly automatized, but there might still be considerable variability in
whether (and how much) other processing can be occurring while the target skill is being executed.


We (Kyllonen & Shute, in press) recently elaborated on this simple taxonomy in proposing that in
addition to the status of the skill (i.e., whether the skill is in a declarative, procedural, or automatic
state, which we identified as the knowledge-type dimension), learning could be classified along three
other dimensions: the learning environment, the domain, and the learner's cognitive style.


The learning environment specifies the nature of the inference process required by the student: The
simplest learning act involves rote memorization. Learning by actively encoding, by deduction, by
analogical reasoning, by refinement through reflection following practice, by induction from
examples, and by observation and discovery involves successively more complex processing on the part
of the learner. The second dimension, the resulting knowledge-type, as indicated above, specifies
whether the product of the learning act is a new chunk of declarative knowledge (a new fact or body of
facts) or new procedural knowledge (a rule, a skill, or a mental model). The third dimension, the
domain, refers to whether learning is occurring in a technical, quantitative domain or a more verbal,
nontechnical domain. Together, these three dimensions specify a particular kind of learning act. The
fourth dimension, the learner's cognitive style, is a property of the learner rather than of the
instructional situation per se. But we included it in recognition of the possibility that we cannot be
certain on any task of what learning skill is being assessed unless we consider how the learner is
approaching the task.

Our proposal, which has not in any sense been put to the test, is that the taxonomy should prove
useful in two ways. First, it provides a sampling space from which we may draw learning tasks. The
goal of the LAMP effort is to model learning ability using cognitive skill measures; the taxonomy
specifies the range of learning tasks for which we must develop adequate models. Second, in reverse
fashion, the taxonomy specifies the kinds of micro-level learning acts that combine to make complex
learning. This aspect provides a task analysis tool. Our idea is that we can inspect the requirements of
any complex learning situation, in the classroom or in front of a computer, and specify what learning
acts are occurring. Given any instructional exchange, we can find a cell in the taxonomy that represents
that exchange.
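
As an illustration only, the two uses of the taxonomy described above (a sampling space for learning
tasks and a task analysis tool) can be sketched in a few lines of Python. The level names below are
paraphrased from the text; the function names, the choice of two knowledge types, and the exact set of
seven environments are assumptions made for this sketch and are not part of the LAMP materials. The
learner's cognitive style is omitted because it is a property of the learner rather than of the learning act.

```python
from itertools import product
import random

# Levels paraphrased from the text; the exact labels are assumptions for this sketch.
ENVIRONMENTS = [
    "rote memorization",
    "active encoding",
    "deduction",
    "analogical reasoning",
    "refinement through practice",
    "induction from examples",
    "observation and discovery",
]
KNOWLEDGE_TYPES = ["declarative", "procedural"]
DOMAINS = ["technical/quantitative", "verbal/nontechnical"]


def taxonomy_cells():
    """Return every (environment, knowledge type, domain) cell of the taxonomy."""
    return list(product(ENVIRONMENTS, KNOWLEDGE_TYPES, DOMAINS))


def sample_learning_tasks(n, seed=0):
    """Use 1: draw n distinct cells at random as specifications for learning tasks."""
    rng = random.Random(seed)
    return rng.sample(taxonomy_cells(), n)


def classify_exchange(environment, knowledge_type, domain):
    """Use 2: map an observed instructional exchange onto its taxonomy cell."""
    cell = (environment, knowledge_type, domain)
    if cell not in taxonomy_cells():
        raise ValueError(f"not a cell of the taxonomy: {cell}")
    return cell


if __name__ == "__main__":
    print(len(taxonomy_cells()), "cells in this hypothetical version of the taxonomy")
    for task in sample_learning_tasks(3):
        print(task)
```

Under these assumed levels the sampling space is simply the cross product of the three dimensions; a
finer or coarser set of levels would change only the three lists at the top.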

Complex Learning Assessment (CLASS) 

One potential stumbling block for any program like ours is that it is not easy to monitor progress.
To determine whether our innovative measurement methods are valid predictors of learning success, it
is necessary to observe students engaged in learning. Two approaches have traditionally been taken.
One is to validate the new tests against some criterion reflecting success in operational training, such as
final course grade point average. The benefit of this approach is that inferences from the research are
direct, but there are a number of drawbacks: Data collection is extremely slow, instructor quality is
highly variable and may interact with learner characteristics in affecting learning outcomes, and there is
no allowance for manipulating the learning task in any way so as to allow "what-if" questions regarding
validity (e.g., "What if the instructor encouraged more questions, would that differentially affect student
outcomes?").

The second approach is to simplify the learning task such that it is under the experimenter's control
and can be administered within a single session. With complete control over the learning task, one can
ask and test what-if questions easily. Unfortunately, in so modifying the learning task, the researcher
must necessarily continue to assume that the instruments shown to be valid in the experimental
context will prove to be valid in predicting success in more realistic learning situations.


Our solution to the validity problem represents a compromise between these two positions. We are
currently designing intelligent computerized tutoring systems to teach computer programming,
electronics troubleshooting, and flight engineering in [illegible]-hour mini-courses (Learning Research &
Development Center, 1987), and we will add new mini-courses over the next several years.

The tutoring systems are being designed to produce a rich variety of indices of the learner's curriculum
knowledge and his/her progress in learning the new knowledge and skills being taught. The
tutoring systems are sufficiently flexible so that it is easy to modify the instructional strategy and thus
ask what-if questions. The learning involved, however, is not trivial. It has been estimated that 1 hour
of tutored instruction is equivalent to approximately 4 hours of regular classroom instruction
(Anderson, Boyle, & Reiser, 1985); thus, these mini-courses are quite extensive. A major goal of our
current research efforts is to use the taxonomy to generate the most expressive indices of the student's
learning experience.


We envision a broad range of research questions that can be addressed once we begin gathering
data with these kinds of learning indices. First, the indices can serve as alternatives to end-of-course
achievement test scores as criteria for validating new cognitive aptitude tests. An index such as
"probability of remembering an instructional proposition (as a function of the amount of study and
presentation lag)" is more precise and potentially more general than a broad achievement test score.
Such a fine breakdown of the learning experience also permits enhanced analyses among the indices
themselves. For example, we can begin investigating more precisely questions concerning the
relationship between initial knowledge acquisition and the subsequent ability to turn that knowledge
into problem-solving skill, or the ability to tune that skill with more problem-solving experience.
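
To make the notion of a fine-grained learning index concrete, the following sketch estimates the
probability of remembering an instructional proposition separately for each combination of study amount
and presentation lag, using logged recall attempts. It is a minimal illustration, not a description of
the tutoring systems themselves: the record layout (study amount, lag, recalled), the function name, and
the sample data are all hypothetical, and the estimate is a plain observed proportion rather than a
fitted memory model.

```python
from collections import defaultdict


def recall_probability_index(records):
    """Estimate P(recall) for each (study_amount, lag) condition.

    records: iterable of (study_amount, lag, recalled) tuples, where recalled
    is True if the learner remembered the proposition on that attempt.
    Returns a dict mapping (study_amount, lag) to the observed recall proportion.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [recalled count, attempt count]
    for study_amount, lag, recalled in records:
        cell = counts[(study_amount, lag)]
        if recalled:
            cell[0] += 1
        cell[1] += 1
    return {condition: hits / total for condition, (hits, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical log: (number of study exposures, items intervening before test, recalled?)
    log = [(1, 0, True), (1, 0, False), (2, 0, True), (2, 5, False)]
    for condition, p in sorted(recall_probability_index(log).items()):
        print(condition, p)
```

An index of this kind yields one value per condition for each learner, in contrast to the single number
provided by an end-of-course achievement score.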


Finally, developing rich profiles of an individual learner's strengths and weaknesses in the form of
elaborate [illegible] of learning indices should permit a reassessment of the aptitude-treatment
interaction (ATI) idea (Cronbach & Snow, 1977). Probably, the [illegible]
can be traced to the employment of global aptitude indices and global learning outcome measures
along with pragmatic limitations on instructional variation. The tutoring systems being developed
overcome these limitations by generating richer traces of a learner's path through a curriculum, and by
being sufficiently flexible to allow potentially unlimited variations in how instruction is presented.

IV. SUMMARY AND CONCLUSIONS

This paper has outlined some of the research activities underway as part of the Air Force's Learning
Abilities Measurement Program (LAMP). The major goal of the project is to devise new models of
the nature and organization of human abilities, with the long-term goal of applying those models to
improve current personnel selection and classification systems.

As an approach to this ambitious undertaking, we have divided the activities of the project into two
categories. The first category is concerned with identifying fundamental learning abilities by
determining how learners differ in their abilities to think, remember, solve problems, and acquire
knowledge and skills. From research already completed, we have established a four-source framework
that assumes that observed learner differences are due to differences in information processing
efficiency; working memory capacity; and the breadth, extent, and accessibility of conceptual knowledge
and procedural and strategic skills.

The second category of research activities is concerned with validating new models of learning
abilities. To do this, we are building a number of computerized intelligent tutoring systems that serve
as mini-courses in technical areas such as computer programming and electronics troubleshooting. A
major objective of this part of the program is to develop principles for producing indicators of student
learning progress and achievement. These indicators will serve as the learning outcome measures
against which newly developed learning abilities tests will be evaluated in future validation studies. The
[illegible] will be [illegible] theories of knowledge and skill acquisition
and in studies that attempt to [illegible] and capitalize on and compensate for learner
strengths and weaknesses.